Object Detection with YOLOv8: A Practical Application¶
This notebook demonstrates a complete object detection workflow using YOLOv8, one of the most practical and efficient models for real-world applications.
What you'll learn:
- Using pre-trained YOLOv8 for immediate object detection
- Fine-tuning on a custom dataset (pedestrian detection)
- Evaluating and visualizing results
Why YOLOv8? Fast, accurate, easy to use, and excellent for deployment.
In [1]:
# Installation
!pip install ultralytics opencv-python matplotlib pillow
Collecting ultralytics
  Downloading ultralytics-8.3.213-py3-none-any.whl.metadata (37 kB)
Requirement already satisfied: opencv-python in /usr/local/lib/python3.12/dist-packages (4.12.0.88)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.12/dist-packages (3.10.0)
Requirement already satisfied: pillow in /usr/local/lib/python3.12/dist-packages (11.3.0)
...
Successfully installed ultralytics-8.3.213 ultralytics-thop-2.0.17
In [2]:
from ultralytics import YOLO
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import urllib.request
import os
print('Environment ready!')
Creating new Ultralytics Settings v0.0.6 file ✅
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'.
Environment ready!
Part 1: Quick Start - Pre-trained Detection¶
Let's start by using YOLOv8 pre-trained on the COCO dataset (80 common object classes).
In [3]:
# Load pre-trained YOLOv8
model = YOLO('yolov8n.pt') # n = nano (fastest), also available: s, m, l, x
print('YOLOv8 model loaded!')
print(f'Model can detect {len(model.names)} classes')
print(f'Classes: {list(model.names.values())[:10]}...') # Show first 10
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 282.8MB/s 0.0s
YOLOv8 model loaded!
Model can detect 80 classes
Classes: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light']...
In [4]:
# Download sample images
sample_urls = [
'http://images.cocodataset.org/val2017/000000039769.jpg', # Cats
'http://images.cocodataset.org/val2017/000000397133.jpg', # Sports
'http://images.cocodataset.org/val2017/000000037777.jpg', # Traffic
]
os.makedirs('samples', exist_ok=True)
image_paths = []
for i, url in enumerate(sample_urls):
    try:
        path = f'samples/image_{i}.jpg'
        urllib.request.urlretrieve(url, path)
        image_paths.append(path)
        print(f'Downloaded image {i+1}')
    except Exception as e:
        print(f'Failed to download image {i+1}: {e}')
print(f'\nReady to detect on {len(image_paths)} images!')
Downloaded image 1
Downloaded image 2
Downloaded image 3

Ready to detect on 3 images!
In [5]:
# Run detection on all images
results = model(image_paths, conf=0.5) # conf = confidence threshold
# Visualize results
fig, axes = plt.subplots(len(results), 2, figsize=(16, 6*len(results)))
if len(results) == 1:
    axes = axes.reshape(1, -1)

for idx, (result, path) in enumerate(zip(results, image_paths)):
    # Original image
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    axes[idx, 0].imshow(img)
    axes[idx, 0].set_title('Original', fontsize=14)
    axes[idx, 0].axis('off')

    # Detection result (plot() returns a BGR array; flip to RGB for matplotlib)
    result_img = result.plot()[:, :, ::-1]
    axes[idx, 1].imshow(result_img)
    axes[idx, 1].set_title(f'Detected: {len(result.boxes)} objects', fontsize=14)
    axes[idx, 1].axis('off')

    # Print detected objects
    print(f'\nImage {idx+1}:')
    for box in result.boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        print(f'  {model.names[cls]}: {conf:.3f}')

plt.tight_layout()
plt.show()
0: 640x640 2 cats, 1 remote, 29.7ms
1: 640x640 1 person, 3 bowls, 1 oven, 29.7ms
2: 640x640 1 dining table, 1 oven, 1 refrigerator, 29.7ms
Speed: 10.5ms preprocess, 29.7ms inference, 125.0ms postprocess per image at shape (1, 3, 640, 640)

Image 1:
  cat: 0.868
  cat: 0.831
  remote: 0.830

Image 2:
  person: 0.894
  bowl: 0.747
  bowl: 0.745
  bowl: 0.725
  oven: 0.563

Image 3:
  refrigerator: 0.924
  oven: 0.897
  dining table: 0.690
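`result.plot()` is handy for visualization, but downstream code usually needs the raw detections, which `ultralytics` exposes as `box.cls`, `box.conf`, and `box.xyxy`. Here is a minimal sketch of the filtering and per-class counting you would apply to those values, using plain `(class_id, confidence, box)` triples with hypothetical sample values rather than the results from the run above:

```python
from collections import Counter

def summarize_detections(detections, names, conf_thresh=0.5):
    """Filter (class_id, confidence, xyxy) triples by confidence and count per class.

    Mirrors the post-processing you'd do on values pulled out of
    result.boxes (box.cls, box.conf, box.xyxy) in ultralytics.
    """
    kept = [(c, p, xyxy) for c, p, xyxy in detections if p >= conf_thresh]
    counts = Counter(names[c] for c, _, _ in kept)
    return kept, dict(counts)

# Hypothetical detections: (class_id, confidence, [x1, y1, x2, y2])
names = {0: 'person', 15: 'cat'}
dets = [(15, 0.87, [10, 20, 200, 180]),
        (15, 0.83, [220, 30, 400, 190]),
        (0, 0.42, [5, 5, 50, 120])]  # Below threshold, dropped

kept, counts = summarize_detections(dets, names)
print(counts)  # {'cat': 2}
```

The same thresholding happens inside `model(..., conf=0.5)`; this sketch just makes the logic explicit for when you need custom filtering.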
Part 2: Fine-tuning on Custom Dataset¶
Now let's fine-tune YOLOv8 on a custom dataset for pedestrian detection using the Penn-Fudan dataset.
In [6]:
# Download Penn-Fudan dataset
!wget -q https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
!unzip -q PennFudanPed.zip
print('Dataset downloaded!')
Dataset downloaded!
In [7]:
# Prepare dataset in YOLO format
import shutil
from pathlib import Path
# Create directory structure
dataset_root = Path('pedestrian_dataset')
for split in ['train', 'val']:
    (dataset_root / split / 'images').mkdir(parents=True, exist_ok=True)
    (dataset_root / split / 'labels').mkdir(parents=True, exist_ok=True)
# Convert instance masks to YOLO-format bounding boxes
def mask_to_bbox(mask_path):
    """Convert an instance mask to YOLO format: class x_center y_center width height (normalized)."""
    mask = np.array(Image.open(mask_path))
    h, w = mask.shape
    bboxes = []
    obj_ids = np.unique(mask)[1:]  # Skip background (id 0)
    for obj_id in obj_ids:
        pos = np.where(mask == obj_id)
        xmin, xmax = np.min(pos[1]), np.max(pos[1])
        ymin, ymax = np.min(pos[0]), np.max(pos[0])
        # Normalize to [0, 1] as YOLO expects
        x_center = ((xmin + xmax) / 2) / w
        y_center = ((ymin + ymax) / 2) / h
        width = (xmax - xmin) / w
        height = (ymax - ymin) / h
        bboxes.append(f'0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}')
    return bboxes
# Process all images
img_dir = Path('PennFudanPed/PNGImages')
mask_dir = Path('PennFudanPed/PedMasks')
images = sorted(list(img_dir.glob('*.png')))
# Split train/val (80/20)
split_idx = int(0.8 * len(images))
train_images = images[:split_idx]
val_images = images[split_idx:]
for split, img_list in [('train', train_images), ('val', val_images)]:
    for img_path in img_list:
        # Copy image
        shutil.copy(img_path, dataset_root / split / 'images' / img_path.name)
        # Create matching label file from the instance mask
        mask_path = mask_dir / img_path.name.replace('.png', '_mask.png')
        bboxes = mask_to_bbox(mask_path)
        label_path = dataset_root / split / 'labels' / img_path.name.replace('.png', '.txt')
        with open(label_path, 'w') as f:
            f.write('\n'.join(bboxes))
print(f'Dataset prepared:')
print(f' Train: {len(train_images)} images')
print(f' Val: {len(val_images)} images')
Dataset prepared:
  Train: 136 images
  Val: 34 images
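To sanity-check the conversion logic, here is the same bbox extraction applied to a tiny synthetic mask in memory (a sketch with made-up values, independent of the dataset files; `mask_to_yolo_boxes` is an illustrative helper name):

```python
import numpy as np

def mask_to_yolo_boxes(mask):
    """Same logic as mask_to_bbox above, but on an in-memory array."""
    h, w = mask.shape
    boxes = []
    for obj_id in np.unique(mask)[1:]:  # skip background (0)
        ys, xs = np.where(mask == obj_id)
        xmin, xmax = xs.min(), xs.max()
        ymin, ymax = ys.min(), ys.max()
        boxes.append((
            ((xmin + xmax) / 2) / w,   # x_center, normalized
            ((ymin + ymax) / 2) / h,   # y_center, normalized
            (xmax - xmin) / w,         # width, normalized
            (ymax - ymin) / h,         # height, normalized
        ))
    return boxes

# One square object occupying rows/columns 2..5 of a 10x10 mask
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:6, 2:6] = 1
(x_c, y_c, bw, bh), = mask_to_yolo_boxes(mask)
print(x_c, y_c, bw, bh)  # 0.35 0.35 0.3 0.3
```

If the center lands where you expect and all values stay in [0, 1], the conversion is behaving.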
In [8]:
# Create dataset config file
config = f"""
path: {dataset_root.absolute()}
train: train/images
val: val/images
nc: 1
names: ['pedestrian']
"""
with open('pedestrian.yaml', 'w') as f:
    f.write(config)
print('Config file created!')
Config file created!
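A malformed dataset YAML is a common source of cryptic training errors, so a quick structural check for the keys YOLO expects (`path`, `train`, `val`, `nc`, `names`) is cheap insurance. A minimal sketch using plain string parsing, with an illustrative config (the `path` value below is just an example):

```python
def check_dataset_config(text, required=('path', 'train', 'val', 'nc', 'names')):
    """Return the required top-level keys missing from a YOLO dataset config string."""
    present = {line.split(':', 1)[0].strip()
               for line in text.splitlines() if ':' in line}
    return [k for k in required if k not in present]

config_text = """
path: /content/pedestrian_dataset
train: train/images
val: val/images
nc: 1
names: ['pedestrian']
"""
missing = check_dataset_config(config_text)
print(missing)  # []
```

For anything more complex than this flat layout, parsing with PyYAML (`yaml.safe_load`) is the more robust choice.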
In [9]:
# Visualize a sample from training data
sample_img = train_images[0]
sample_label = dataset_root / 'train' / 'labels' / sample_img.name.replace('.png', '.txt')
# Read image
img = cv2.imread(str(dataset_root / 'train' / 'images' / sample_img.name))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
# Read and draw boxes
with open(sample_label) as f:
    for line in f:
        cls, x_c, y_c, width, height = map(float, line.strip().split())
        # Convert normalized coordinates back to pixels
        x_c, y_c, width, height = x_c * w, y_c * h, width * w, height * h
        x1 = int(x_c - width/2)
        y1 = int(y_c - height/2)
        x2 = int(x_c + width/2)
        y2 = int(y_c + height/2)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
plt.figure(figsize=(10, 8))
plt.imshow(img)
plt.title('Sample Training Image with Annotations')
plt.axis('off')
plt.show()
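The normalized-to-pixel conversion in the cell above is easy to get subtly wrong (off-by-half on the box extents), so it's worth isolating as a small helper. A sketch with illustrative values (`yolo_to_xyxy` is a hypothetical helper name, not an ultralytics function):

```python
def yolo_to_xyxy(x_c, y_c, bw, bh, img_w, img_h):
    """Convert a normalized YOLO box to integer pixel corners (x1, y1, x2, y2)."""
    x_c, y_c, bw, bh = x_c * img_w, y_c * img_h, bw * img_w, bh * img_h
    return (int(x_c - bw / 2), int(y_c - bh / 2),
            int(x_c + bw / 2), int(y_c + bh / 2))

# A box centered at (0.5, 0.5) covering half the image in each dimension
corners = yolo_to_xyxy(0.5, 0.5, 0.5, 0.5, 400, 200)
print(corners)  # (100, 50, 300, 150)
```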
In [10]:
# Fine-tune YOLOv8
model = YOLO('yolov8n.pt') # Start from pre-trained weights
# Train
results = model.train(
    data='pedestrian.yaml',
    epochs=20,
    imgsz=640,
    batch=8,
    name='pedestrian_detector',
    patience=5,  # Early stopping if no improvement for 5 epochs
    save=True,
    verbose=True
)
print('Training complete!')
Ultralytics 8.3.213 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (NVIDIA A100-SXM4-40GB, 40507MiB)
Overriding model.yaml nc=80 with nc=1
Model summary: 129 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs
Transferred 319/355 items from pretrained weights
AMP: checks passed ✅
train: Scanning /content/pedestrian_dataset/train/labels... 136 images, 0 backgrounds, 0 corrupt
val: Scanning /content/pedestrian_dataset/val/labels... 34 images, 0 backgrounds, 0 corrupt
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Starting training for 20 epochs...
Epoch  1/20: box_loss 1.017, cls_loss 2.003, dfl_loss 1.182 | val: P 1.000, R 0.378, mAP50 0.954, mAP50-95 0.716
...
Epoch 20/20: box_loss 0.570, cls_loss 0.757, dfl_loss 0.937 | val: P 1.000, R 0.952, mAP50 0.993, mAP50-95 0.819
20 epochs completed in 0.010 hours.
Validating /content/runs/detect/pedestrian_detector/weights/best.pt...
Model summary (fused): 72 layers, 3,005,843 parameters, 0 gradients, 8.1 GFLOPs
                Class  Images  Instances  Box(P)      R   mAP50  mAP50-95
                  all      34         69   0.984  0.957   0.993     0.828
Speed: 0.1ms preprocess, 0.9ms inference, 0.0ms loss, 1.7ms postprocess per image
Results saved to /content/runs/detect/pedestrian_detector
Training complete!
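The `patience=5` argument means training stops once validation fitness hasn't improved for 5 consecutive epochs. A minimal sketch of that logic (this is an illustration of the idea, not Ultralytics' actual implementation, which tracks a weighted fitness score over mAP metrics):

```python
def early_stop_epoch(scores, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best, since_best = float('-inf'), 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best, since_best = score, 0  # new best: reset the counter
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return None

# Improves for three epochs, then plateaus for five -> stops at epoch 8
print(early_stop_epoch([0.5, 0.6, 0.7, 0.68, 0.69, 0.66, 0.7, 0.65], patience=5))  # 8
```

On a small dataset like this one, a low patience value keeps you from burning GPU time on a run that has already converged.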
Part 3: Evaluation and Results¶
In [11]:
# Load best model
model = YOLO('runs/detect/pedestrian_detector/weights/best.pt')
# Evaluate on validation set
metrics = model.val(data='pedestrian.yaml')
print('\nValidation Metrics:')
print(f' mAP50: {metrics.box.map50:.3f}')
print(f' mAP50-95: {metrics.box.map:.3f}')
print(f' Precision: {metrics.box.mp:.3f}')
print(f' Recall: {metrics.box.mr:.3f}')
Ultralytics 8.3.213 🚀 Python-3.12.11 torch-2.8.0+cu126 CUDA:0 (NVIDIA A100-SXM4-40GB, 40507MiB)
Model summary (fused): 72 layers, 3,005,843 parameters, 0 gradients, 8.1 GFLOPs
val: Scanning /content/pedestrian_dataset/val/labels.cache... 34 images, 0 backgrounds, 0 corrupt
                Class  Images  Instances  Box(P)      R   mAP50  mAP50-95
                  all      34         69   0.984  0.957   0.993     0.825
Speed: 1.8ms preprocess, 20.4ms inference, 0.0ms loss, 4.3ms postprocess per image
Results saved to /content/runs/detect/val

Validation Metrics:
  mAP50: 0.993
  mAP50-95: 0.825
  Precision: 0.984
  Recall: 0.957
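All of these metrics rest on IoU (intersection over union) between predicted and ground-truth boxes: mAP50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.5 to 0.95. A minimal IoU implementation for `(x1, y1, x2, y2)` boxes:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes don't overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # 50 / 150 ≈ 0.333
print(iou((0, 0, 10, 10), (20, 20, 30, 30))) # 0.0 (disjoint)
```

The high mAP50 here (0.993) with a lower mAP50-95 (0.825) is typical: the model finds nearly every pedestrian, but box edges are not always tight enough to clear the stricter IoU thresholds.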
In [12]:
# Test on validation images
test_images = list((dataset_root / 'val' / 'images').glob('*.png'))[:6]
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flat
for ax, img_path in zip(axes, test_images):
    # Run detection on a single validation image
    result = model(str(img_path), conf=0.5)[0]
    # Visualize (plot() returns a BGR array; flip to RGB for matplotlib)
    result_img = result.plot()[:, :, ::-1]
    ax.imshow(result_img)
    ax.set_title(f'Detected: {len(result.boxes)} pedestrians', fontsize=12)
    ax.axis('off')
plt.tight_layout()
plt.show()
image 1/1 /content/pedestrian_dataset/val/images/PennPed00082.png: 544x640 1 pedestrian, 109.0ms
image 1/1 /content/pedestrian_dataset/val/images/PennPed00079.png: 608x640 1 pedestrian, 65.6ms
image 1/1 /content/pedestrian_dataset/val/images/PennPed00077.png: 608x640 1 pedestrian, 6.8ms
image 1/1 /content/pedestrian_dataset/val/images/PennPed00074.png: 512x640 2 pedestrians, 63.1ms
image 1/1 /content/pedestrian_dataset/val/images/PennPed00093.png: 544x640 1 pedestrian, 8.2ms
image 1/1 /content/pedestrian_dataset/val/images/PennPed00065.png: 640x640 1 pedestrian, 7.4ms
In [13]:
# Visualize training curves
from IPython.display import Image as IPImage, display
print('Training Results:')
display(IPImage('runs/detect/pedestrian_detector/results.png'))
Training Results: